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Summary 

A  fully  three-dimensional,  multivariate,  optimum-interpolation  ocean  data  assimilation  system  has  been 
developed  that  produces  simultaneous  analyses  of  temperature,  salinity,  geopotential  and  vector  velocity, 
The  system  is  run  in  real-time,  and  can  be  executed  as  a  stand-alone  analysis  or  cycled  with  an  ocean  fore¬ 
cast  model  in  a  sequential  incremental  update  cycle.  Additional  capabilities  have  been  buill  into  ihe  system, 
including  flow-dependent  background -error  correlations  and  background -error  variances  that  vary  in  space  and 
evolve  from  one  analysis  cycle  to  the  next.  The  ocean  data  types  assimilated  include:  remotely  sensed  sea  surface 
temperature,  sea  surface  height,  and  sea-ice  concentration;  plus  in  situ  surface  and  sub-surface  observations  of 
temperature,  salinity,  and  currents  from  a  variety  of  sources,  such  as  ships,  buoys,  expendable  bathythermographs, 
c on d u ct i y i t y-tc mperat u re-dep th  sensors,  and  profiling  floats.  An  ocean  data  quality-control  system  is  fully  inte¬ 
grated  with  the  multivariate  analysis,  and  includes  feedback  of  forecast  fields  and  prediction  errors  in  the  quality 
control  of  new  observations.  The  system  is  operational  at  the  US  Navy  oceanographic  production  centres  both 
in  global  and  in  regional  applications.  It  is  being  implemented  as  the  data  assimilation  component  of  the  Hybrid 
Coordinate  Ocean  Model  as  part  of  the  US  contribution  to  the  Global  Ocean  Data  Assimilation  Experiment,  and  in 
a  limited-area  ensemble- based  forecasting  system  that  will  be  used  in  an  adaptive  sampling,  targeted  observation 
application. 

Keywords:  Background  errors  Error  covariances  Forecasting  Observation  errors  Quality  control 
Validation 


1.  Introduction 

The  purpose  of  this  paper  is  to  provide  an  overview  of  the  ocean  data  assimila¬ 
tion  system  in  operational  use  at  the  US  Navy  oceanographic  production  centres:  Fleet 
Numerical  Meteorology  and  Oceanography  Center  (FNMOC)  and  the  Naval  Oceano¬ 
graphic  Office  (NAVQCEANG),  The  analysis  system  is  referred  to  as  the  Navy  Coupled 
Ocean  Data  Assimilation  (NCODA),  and  was  developed  as  part  of  the  Naval  Research 
Laboratory  coupled  modelling  projects  sponsored  by  the  Office  of  Naval  Research. 
The  analysis  can  be  used  in  a  variety  of  ways  in  operations,  supporting  both  global 
and  regional  applications.  The  analysis  can  be  executed  in  two-dimensional  (2D)  mode 
to  provide  sea  surface  temperature  (SST)  and  sea  ice  concentration  lower-boundary 
conditions  for  the  Navy  global  and  regional  atmospheric  forecast  models,  or  it  can  be 
run  in  3D  mode  to  provide  a  stand-alone  analysis  of  ocean  temperature,  salinity,  and 
geostrophic  currents.  Alternatively,  the  analysis  can  be  cycled  with  an  ocean  forecast 
model  in  a  sequential  incremental  update  cycle,  providing  updated  initial  conditions 
for  the  next  ocean  model  forecast  run.  The  analysis  background,  or  first-guess,  fields 
are  generated  from  a  short-term  ocean  model  forecast  or  from  a  previous  analysis. 
The  analysis  computes  corrections  to  the  first-guess  fields  using  all  of  the  observations 
that  have  become  available  since  the  last  analysis  was  made.  NCODA  has  been  cycled 
with  a  variety  of  ocean  forecast  models,  including  H  YCOM  (Hybrid  Coordinate  Ocean 
Model),  NCOM  (Navy  Coastal  Ocean  Model),  POP  (Parallel  Ocean  Prediction  model), 
and  SWAPS  (Shallow  Water  Analysis  Forecast  System  based  on  the  Princeton  Ocean 
Model).  The  system  also  supports  globally  re-locatable,  multi-scale  analyses  on  nested, 
successively  higher-resolution  grids  using  a  3:1  nested  grid  ratio.  This  nesting  strategy 
is  of  particular  importance  in  Navy  applications  where  very  high  resolution  is  required 
in  a  rapid  environmental  assessment  mode  of  operation. 

*  Corresponding  address:  Oceanography  Division,  Naval  Research  Laboratory,  Monterey,  CA  93943,  USA. 
e-mail:  james.cummi  ngs@nrlmry.  navy  mi  I 
©  Royal  Meteorological  Society,  2005. 
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The  examples  used  in  the  paper  are  taken  from  a  global  3D  analysis  running  in 
operational  mode  at  FNMOC,  which  is  described  further  in  section  6.  Validation  of 
the  analysis  system  cycling  with  an  ocean  forecast  model  is  the  subject  of  a  follow- 
on  paper.  Here,  sections  2  and  3  describe  the  assimilation  method  and  the  techniques 
used  to  specify  the  error  covariances.  Section  4  lists  the  ocean  observing  systems 
assimilated  and  outlines  the  quality  control.  Sections  5  and  6  summarize  the  validation 
and  verification  features  built  into  the  analysis  system,  and  provide  a  description  of  the 
current  operational  runs  and  future  applications. 

2.  Method 

The  method  used  in  NCODA  is  an  oceanographic  implementation  of  the  multi¬ 
variate  optimum  interpolation  (MVOI)  technique  widely  used  in  numerical  weather 
prediction  systems.  A  complete  derivation  and  description  of  the  MVOI  method  is 
provided  in  Daley  (1991),  with  application  to  atmospheric  systems  given  in  Lorenc 
(1981)  and  Goerss  and  Phoebus  (1992).  The  ocean  analysis  variables  are  temperature, 
salinity,  geopotential  (dynamic  height)  and  velocity.  All  ocean  variables  are  analysed 
simultaneously  in  three  dimensions.  The  horizontal  correlations  are  multivariate  in 
geopotential  and  velocity,  thereby  permitting  adjustments  to  the  mass  fields  to  be  corre¬ 
lated  with  adjustments  to  the  flow  fields.  The  velocity  adjustments  (or  increments)  are  in 
geostrophic  balance  with  the  geopotential  increments  which,  in  turn,  are  in  hydrostatic 
agreement  with  the  temperature  and  salinity  increments. 

The  MVOI  problem  is  formulated  as: 

xa  =  xb  +  PbHT(HPbHT  +  R)~l(y  -  H(xb)},  (1) 

where  xa  is  the  analysis  vector,  xb  is  the  background  vector,  Pb  is  the  background- 
error  covariance  matrix,  H  is  the  forward  operator,  R  is  the  observation  error  covari¬ 
ance  matrix,  and  y  is  the  observation  vector.  The  observation  vector  contains  all  of  the 
synoptic  temperature,  salinity  and  velocity  observations  that  are  within  the  geographic 
and  time  domains  of  the  forecast  model  grid  and  update  cycle.  Observations  can  be 
assimilated  at  their  measurement  times  within  the  update-cycle  time  window  by  compar¬ 
ison  against  time-dependent  background  fields  using  the  first-guess  at  appropriate  time 
(FGAT)  method.  An  advantage  of  the  FGAT  method  is  that  it  eliminates  a  component 
of  mean  analysis  error  that  occurs  when  comparing  observations  and  forecasts  not  valid 
at  the  same  time. 

A  forward  model  is  a  method  of  converting  a  forecast  model  variable  to  an 
observed  variable.  The  MVOI  does  not  explicitly  include  any  forward  models,  and  only 
analyses  observations  that  are  of  the  same  variable  as  the  forecast  model.  The  forward 
operator  used  here  is  spatial  interpolation  of  the  forecast  model  grid  to  the  observation 
location  performed  in  three  dimensions.  Thus,  HPbHT  is  approximated  directly  by  the 
background-error  covariance  between  observation  locations,  and  PbHT  directly  by  the 
error  covariance  between  observation  and  grid  locations.  For  the  purposes  of  discussion, 
the  quantity  {y  -  H(xb))  is  referred  to  as  the  innovation  vector,  (y-H(x)}  is  the 
residual  vector,  and  xa  -  xb  is  the  increment  (or  correction)  vector.  The  mix  of  variables 
with  different  units  in  a  multivariate  analysis  requires  that  the  analysis  variables  be 
dimensionless.  Accordingly,  prior  to  an  analysis  the  innovation  vector  is  normalized 
by  the  background  error  at  the  observation  locations,  and  after  an  analysis  the  increment 
vector  is  scaled  by  the  background  error  at  the  grid  locations. 

The  solution  of  (1)  is  carried  out  via  the  overlapping  volume  approach  of  Lorenc 
(1981),  with  some  new  capabilities  to  allow  the  analysis  to  interface  with  any  grid. 
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A  total  of  eight  volume  solutions  are  computed  for  each  analysis  grid  point.  The  different 
solutions  are  weighted  by  grid  point  distance  from  volume  centre  when  forming  the 
final  analysis  estimate,  Volume  size  is  a  function  of  the  local  correlation  length-scale  at 
volume  centre,  and  includes  eight  correlation  length-scales  if  the  number  of  observations 
in  the  volume  does  not  exceed  a  threshold  value  (>7200),  in  which  case  the  volume  is 
subdivided.  Note  that  volume  subdivision  rarely  occurs  in  practice.  The  combination  of 
overlapping  volumes  and  a  large  number  of  correlation  length-scales  within  a  volume, 
produces  analysis  increments  that  are  very  smooth  with  no  seams  along  volume  edges, 
and  reduces  the  departure  from  geostrophy  that  occurs  when  interpolating  different 
solutions. 


3.  Error  covariances 

Specification  of  the  background-  and  observation-error  covariances  in  the  analysis 
is  very  important.  The  background-error  covariances  are  separated  into  a  background- 
error  variance  and  a  correlation.  The  correlation  is  further  separated  into  a  horizontal 
(Ch)  and  a  vertical  (Cv)  component.  All  correlations  of  scalar  variables  are  modelled  as 
second  order  auto-regressive  (SOAR)  functions  of  the  form: 

Ch  =  (1  +  Jh)  exp(-jh) 

Cv  =  (1  +  sv)  exp(-sv),  ^ 

where  Jh  ar,d  jv  are  the  horizontal  and  vertical  distances  between  observations  or  be¬ 
tween  observations  and  grid  points,  normalized  by  the  geometric  mean  of  the  horizontal 
and  the  vertical  correlation  length-scales  at  the  two  locations.  The  horizontal  correlation 
length-scales  vary  with  location  and  depth  in  the  analysis,  and  the  vertical  correlation 
length-scales  vary  with  depth. 

(a)  Horizontal  correlations 

Default  horizontal  correlation  length-scales  are  specified  as  the  first  baroclinic 
Rossby  radius  of  deformation  computed  from  the  historical  profile  archive  (Chelton 
et  al.  1998).  The  Rossby  length-scales  vary  from  10  km  at  the  poles  to  greater  than 
200  km  in  the  Tropics.  Rossby  length-scales  can  be  further  modified  by  a  propor¬ 
tionality  constant  that  is  set  to  the  ratio  of  a  regionally  averaged  correlation  length- 
scale  computed  from  an  innovation  time  series  using  the  innovation  correlation  method 
(Hollingsworth  and  Lonnberg  1986;  see  section  5  below)  and  the  corresponding  region¬ 
ally  averaged  Rossby  length-scales.  Proportionality  constants  computed  in  this  way  are 
on  the  order  of  1.3  to  2.8,  with  small  latitude-dependence.  Alternatively,  horizontal 
correlation  length-scales  can  be  input  directly  into  the  analysis,  thereby  bypassing  the 
default  specifications  based  on  Rossby  radius  of  deformation  scales.  This  option  is 
useful  when  the  horizontal  correlation  length-scales  and  other  covariance  parameters 
required  by  the  analysis  are  computed  using  ensemble  methods,  as  is  described  in 
section  6. 

Flow-dependence  is  introduced  in  the  analysis  by  scaling  the  horizontal  and  ver¬ 
tical  correlations  with  a  correlation  computed  from  the  geopotential  height  difference 
between  two  locations.  The  flow-dependent  correlation  (Cf)  is  computed  using  a  SOAR 
model,  and  the  total  background-error  correlation  (Cb)  is  then  computed  as  the  product 
of  all  three  correlation  components  according  to: 

Cf=  (1  +$f)  exp(-jf) 

Cb  =  CbCyCf, 


(3) 
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Figure  ].  Analysed  temperature  increments  fdegC)  at  100  m  depth  in  the  Kuroshio  Extension  (colour  filled 
contours).  The  increment  field  is  extracted  from  a  global  three-dimensional  multivariate  optimum  interpolation 
analysis  valid  6  August  2005,  Overlay  of  dynamic  height  (0/1000  db)  contours  (contour  interval  0,2  dynamic 
metres)  valid  5  August  2005,  used  to  compute  the  flow-dependent  correlations  in  the  analysis  (geopo ten tial  length- 
scale  =  0.2  dynamic  metres).  Profile  observation  locations  in  the  analysis  are  marked  using  a  plus  symbol  for 
Argo  floats  and  conductivity  temperature  depth  casts,  and  asterisks  for  Modular  Ocean  Data  Assimilation  System 

synthetics. 


where  S(  is  the  geopotential  height  difference  at  two  locations  normalized  by  a  specified 
geopotential  length-scale  (As),  and  Ch  and  Cv  are  as  described  above.  A  small  (large) 
value  of  ha  will  produce  strong  (weak)  llow-dependence  in  the  analysis  increments. 
The  flow-dependent  correlations  tend  to  spread  the  innovations  along  rather  than  across 
geopotential  contours.  This  is  a  desirable  outcome,  since  error  correlations  across  an 
ocean  front  are  expected  to  be  characteristically  smaller  than  error  correlations  along 
the  front.  An  example  of  this  calculation  is  shown  in  Fig.  I,  which  shows  the  flow- 
dependent  analysis  increments  from  a  global  3D  MVOI  analysis  in  the  vicinity  of  the 
Kuroshio  Extension  western  boundary  current  front  east  of  Japan,  and  the  geopotential 
background  field  valid  the  previous  day  that  was  used  to  compute  the  flow-dependent 
correlations.  The  temperature  increments  are  clearly  constrained  by  the  meanders  of 
the  Kuroshio  front  as  it  leaves  the  coast,  and  a  strong  cold-core  eddy  south  of  the 
front.  A  potential  drawback  to  this  method  is  that  the  flow -dependent  correlations  are 
computed  directly  from  the  forecast  model  height  fields,  thus  they  depend  strongly  on 
the  accuracy  of  the  model  forecast.  This  is  equivalent  to  an  assumption  of  a  perfect 
model,  and  may  not  prove  to  be  very  useful  in  practice  if  the  forecast  model  fields  are 
inaccurate.  Accordingly,  flow-dependence  can  be  switched  off,  or  hs  can  be  set  to  a 
relatively  large  value,  to  prevent,  or  minimize,  a  mode!  forecast  with  systematic  error 
from  adversely  affecting  the  analysis, 
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(b)  Vertical  correlations 

A  variety  of  options  are  available  in  the  analysis  to  specify  the  vertical  correlation 
length-scales.  Vertical  correlation  length-scales  can:  (i)  be  constant  (or  zero),  (ii)  mono- 
tonically  increase  or  decrease  with  depth,  or  (iii)  vary  with  background  density  vertical 
gradients.  In  the  latter  option,  a  change  in  density  stability  criterion  is  used  to  define 
a  well-mixed  layer.  The  change  in  density  criterion  is  then  scaled  by  the  background 
vertical  density  gradient  according  to: 

hv  =  A/Cfr/dz),  (4) 

where  /iv  is  the  vertical  correlation  length-scale,  is  the  change  in  density  criterion, 
and  3p/3z  is  the  vertical  density  gradient.  The  resulting  vertical  correlation  length- 
scales  vary  with  depth  and  are  large  (small)  when  the  water  column  stratification  is  weak 
(strong).  The  vertical  correlations  are  computed  each  update  cycle  from  the  background 
density  fields.  This  process  allows  the  vertical-scales  to  evolve  from  one  analysis  cycle 
to  the  next  to  capture  changes  in  mixed  layer  and  thermocline  depths. 

(c)  Multivariate  correlations 

The  horizontal  correlation  functions  described  above  are  used  in  the  analysis  of 
temperature,  salinity  and  geopotential,  and  all  of  these  variables  are  assumed  to  have  the 
same  background-error  correlations.  The  formulation  of  the  multivariate  background- 
error  correlations,  however,  is  derived  from  the  first  and  second  derivatives  of  the 
SOAR  model  function.  This  form  requires  the  calculation  of  the  angles  between  the 
two  locations  and  the  specification  of  a  parameter  v,  which  measures  the  divergence' 
permitted  in  the  velocity  correlations,  and  a  parameter  fi,  which  specifies  the  strength 
of  the  geostrophic  coupling  of  the  velocity/geopotential  correlations.  Typically,  v  is  set 
to  a  constant  small  value  (v  =  0.05)  that  does  not  vary  with  location.  This  setting  will 
produce  velocity  increments  that  are  weakly  divergent,  and  assumes  that  the  divergence 
is  not  correlated  with  changes  in  the  mass  field.  Parameter  fi,  however,  does  vary 
from  0  to  1  with  location.  It  is  scaled  to  zero  within  2°  of  latitude  from  the  equator 
where  the  /-plane  geostrophic  equation  is  singular,  and  in  shallow  water  <50  m  deep 
where  friction  rather  than  pressure  gradient  forces  controls  ocean  (low.  As  mentioned 
previously,  a  full  derivation  of  the  multivariate  horizontal  correlations  is  provided  in 
Daley  (1991).  Using  the  derivatives  of  the  geopotential  SOAR  correlation  model  and 
converting  from  natural  to  rectangular  coordinates,  the  correlation  functions  of  the 
possible  combinations  of  geopotential  and  velocity  are  shown  in  Fig.  2.  There  are  nine 
possible  combinations,  but  for  clarity  only  one  form  of  the  cross-correlations  between 
geopotential  and  the  velocity  components  is  shown. 

(rf)  Background-error  variances 

Background-error  variances  are  poorly  known  in  the  ocean  and  are  likely  to  be 
strongly  dependent  on  model  resolution  and  other  factors,  such  as  atmospheric  model 
forcing  errors  and  ocean  model  parametrization  errors.  In  the  analysis,  the  background- 
error  variances  {e\)  vary  with  location,  depth,  and  analysis  variable,  and  are  computed 
prior  to  an  analysis  from  a  time  history  of  the  analysed  increment  fields  according  to: 

£  =  exp(-r/rc)2 

el  =  P  ‘  (  Yj  W*(5ia  -  x^k  +  w«+l  «xa  -  xb)2))  +  (I  —  j8)  ■  <rb2, 


(5) 
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Figure  2.  Auto-  and  cross -cone] aliens  of  horizontal  multivariate  correlation  functions  for  gcopotcntial  (p\ 
u  vector  component  of  ocean  current  velocity  (u).  and  v  vector  velocity  (u).  Warm  (cool)  colours  indicate  positive 
(negative)  correlations,,  The  multivariate  correlation  functions  have  been  computed  with  the  geostrophie  coupling 
parameter  set  to  1.0  and  the  divergence  parameter  set  to  0,1  (see  text  for  details). 


where  c  refers  to  a  fixed  climate  time-scale,  xa  -  xb  is  the  increment  vector  (indices 
indicating  grid  location  and  depth  are  omitted  for  clarity),  w  is  the  weight  vector, 
(...)  indicates  a  long-term  mean  increment  vector  over  all  update  cycles  into  the 
past  older  than  the  number  of  recent  update  cycles  n\k  is  the  update  cycle  index,  r  is 
the  age  of  the  data  (r  >  0),  rc  is  an  integral  time-scale,  ab2  is  the  expected  variance 
of  the  analysis  variable,  and  is  a  factor  that  combines  the  error  contributions  from 
the  increments  and  expectations  based  on  age  of  the  data  on  the  grid.  The  first  and 
second  terms  on  the  right-hand  side  of  (5)  compute  the  background  error  from  the 
analysis  increment  fields,  with  recent  increments  more  heavily  weighted  than  errors 
accumulated  over  many  update  cycles.  Elements  of  the  increment  vector  must  exceed  a 
minimum  threshold  value  |xa  —  xb|  >  &  in  order  to  be  used  in  the  summations,  where  the 
magnitude  of  $  depends  on  the  analysis  variable.  Weight  vector,  w*,  is  computed  using 
a  geometric  series,  =  (1  -  0)*_1,  where  <f>  is  a  tunable  constant  between  0  and  1 
{typically  set  to  0.2),  and  normalized  such  that  the  weighted  averages  are  unbiased. 
The  third  term  on  the  right-hand  side  of  (5)  allows  the  background-error  variances  to 
increase  with  time  in  the  long-term  absence  of  observations  until  the  errors  asymptote  at 
the  limit  of  the  expected  variance  (<rb ),  specified  as  either  climate  variability  or  model 
error.  If  available,  model  error  is  computed  from  a  multi-year  time  series  of  differences 
between  free  running  model  states  at  the  analysis  update  cycle  time  interval. 
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The  relaxation  to  an  expected  error  variance  is  needed  because  not  all  analysis  grid 
locations  are  corrected  in  every  update  cycle,  due  to  sampling  limitations  of  the  ocean 
observing  systems.  In  order  to  distinguish  between  a  zero  increment  indicating  a  perfect 
model  forecast  or  simply  no  data,  the  age  of  the  data  on  the  grid  (t)  is  computed.  Age  of 
the  data  is  defined  as  the  number  of  hours  since  a  grid  location  has  been  influenced  by  an 
observation.  It  can  be  negative,  indicating  the  influence  of  observations  younger  than  the 
analysis  time.  Observation  age  innovations  are  computed  as  the  difference  between  the 
observation  time  and  the  valid  time  of  the  analysis,  and  assimilated  as  an  uncorrelated 
scalar  using  the  same  error  correlations  as  the  mass  variables.  The  background  age  field 
is  increased  by  the  number  of  hours  in  the  update  cycle  at  the  beginning  of  an  analysis 
and  can  only  be  reduced  by  the  assimilation  of  synoptic  data.  Time-scales  (rc)  specified 
independently  for  each  analysis  variable  are  used  to  normalize  age  of  the  data  on  the 
grid.  The  time-scales  range  from  MO  days  for  surface-  and  mixed-layer  variables  to 
MO  days  for  variables  at  depth.  Evaluation  of  time-scales  computed  using  a  relationship 
between  the  modified  Rossby  length-scales  (<f)  and  current  speed  (j),  given  by  rc  =  d/s, 
is  underway.  Here,  time-scales  vary  with  location  and  depth,  and  range  from  a  few  days 
near  the  surface  in  high  speed,  western  boundary  current  regions,  to  ~7  years  in  the 
deep  ocean  where  current  speeds  are  slow, 

In  practice,  the  background -error  variances  evolve  to  a  quasi-steady  state  over  time. 
Figure  3  shows  examples  of  the  background  error  and  age  of  the  data  on  the  grid  for 
SST,  altimeter  sea  surface  height  anomaly  (SSHA),  and  temperature  at  400  m  depth, 
from  a  global  3D  MVOI  analysis  cycled  from  20  June  to  1  November  2005  using  a 
24-hour  update  cycle.  The  number  of  increment  fields  into  the  past  and  time-scales  used 
in  the  summations  are  10  days  for  SST,  20  days  for  SSHA,  and  30  days  for  temperature 
at  depth.  SST  data  age  shows  that  the  SST  field  is  well  constrained  by  observations, 
with  the  result  that  SST  background  errors  are  primarily  a  function  of  the  analysis 
increments.  SSHA  data  age  shows  the  characteristic  diamond-shaped  data  void  areas 
between  altimeter  satellite  tracks,  and  data  age  at  400  m  shows  numerous  areas  that 
have  not  been  observed  for  4  weeks  or  more,  such  as  the  south-east  Pacific  and  eastern 
Atlantic.  SSHA  and  temperature  background  errors  at  depth  are,  therefore,  a  mix  of 
analysis  increment  errors  and  climate  errors,  due  to  the  lack  of  observations  in  some 
locations  for  extended  time  periods.  As  expected  from  an  analysis  cycling  on  itself,  the 
background  errors  tend  to  reflect  variability  associated  with  western  boundary  currents. 
However,  the  pattern  and  the  magnitude  of  the  background  errors  are  expected  to  be 
very  different  when  the  analysis  is  cycled  with  an  ocean  forecast  model  as  a  first-guess. 
Ocean  mesoscale  variability  will  be  increased  by  contributions  from  atmospheric  forcing 
and  ocean  model  errors,  and  reduced  by  ocean  model  forecast  skill. 

The  adaptive  scheme  implemented  here  is  designed  to  provide  background  errors 
that;  (i)  are  appropriate  for  the  time  interval  at  which  data  are  inserted  into  the  model, 
(it)  are  coherent  with  the  variance  of  the  innovation  time  series,  (iii)  reflect  the  variable 
skill  of  the  different  ocean  forecast  models  that  are  used  with  the  analysis  system, 
(iv)  adjust  quickly  to  new  ocean  areas  when  the  analysis  is  re-located  in  a  rapid 
environmental  assessment  mode  of  operation,  and  (v)  can  be  computed  when  the 
analysis  is  cycled  on  itself  or  with  a  forecast  model.  One  difficulty  with  using  analysis 
increments  to  compute  background-error  variances  is  that  the  increments  contain  a 
mixture  of  forecast  and  analysis  error.  Analysis  errors  result  from  the  fact  that  the 
statistical  parameters  used  in  the  analysis  represent  expected  values,  and  are  unlikely 
to  be  correct  at  all  places  and  at  all  times.  While  the  analysed  increment  approach  may 
not  represent  all  aspects  of  the  background  error,  the  method  does  provide  a  reasonable 
measure  of  the  spatial  structure  of  the  background  errors.  Inaccuracies  in  the  magnitudes 
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Figure  3,  Background -error  standard  deviations  and  age  of  the  data  on  the  grid  from  a  global  three-dimensional 
multivariate  optimum  interpolation  analysis  valid  I  November  2005:  (a)  sea  surface  height  anomaly  (SSHA) 
background  errors  (in);  (b)  SSHA  data  age;  (c)  sea  surface  temperature  (SST)  background  errors  (degC);  (d)  SST 
data  age;  (e)  temperature  errors  at  400  m  (tlegC);  (0  age  of  the  data  at  400  m.  Ages  are  given  in  hours.  While  areas 
represent  ice  covered  seas  or  grid  locations  shallower  than  400  m+  Note  differences  in  the  colour  scales  between 

panels. 


of  ihe  background-error  variances  are  adjusted  with  the  7min  diagnostic  described  in 
section  5. 


(e)  Observation-error  variances 

The  observation  errors  and  the  background  errors  are  assumed  to  be  uncorrelated, 
and  errors  associated  with  observations  made  at  different  locations  and  at  different  times 
are  also  assumed  to  be  uncorrelated.  As  a  result  of  these  assumptions,  the  observation- 
error  covariance  matrix  R  is  set  equal  to  1  +  along  the  diagonal  and  zero  elsewhere, 
where  E ^  represents  observation -error  variances  normalized  by  the  background-error 


OPERATIONAL  MULTIVARIATE  OCEAN  DATA  ASSIMILATION 


3591 


variances  interpolated  to  the  observation  location.  (Recall  from  section  2  that  the 
innovation  vector  in  (1)  is  normalized  by  e£).  This  noise-to-signal  ratio  will  vary  from 
one  update  cycle  to  the  next,  since  the  background  errors  and  the  components  of  some 
of  the  observation  errors  change  with  time  and  location  in  the  analysis. 

Observation  errors  are  computed  as  the  sum  of  a  measurement  error  (<?^ )  and  a 
representation  error  (ej?).  Measurement-error  variances  are  specified  as  input  parameters 
in  the  analysis  with  one  exception  that  is  described  below.  Measurement  errors  reflect 
the  accuracy  of  the  instruments  and  the  ambient  conditions  in  which  they  operate.  These 
sources  of  error  are  fairly  well  known  in  the  ocean  for  many  observing  systems,  although 
the  magnitudes  of  the  measurement  errors  can  change  in  time  due  to  calibration  drift 
of  the  instruments  and  other  factors.  Some  observing  system  measurement  errors  are 
now  included  in  the  real-time  data  stream  by  the  data  providers.  For  example,  satellite 
SST  retrieval  errors  are  routinely  computed  using  a  sliding  time  window  of  space- 
time  collocations  of  drifting  buoy  measurements  of  SST  and  satellite  SST  retrievals. 
These  error  estimates  vary  with  the  satellite  platform,  retrieval  algorithm  and  time.  Data 
sources  that  have  fixed  specifications  of  measurement  error  are  listed  in  Table  I . 

Representation  errors,  however,  are  a  function  of  the  resolutions  of  the  model  and 
of  the  observing  network,  and  are  much  more  difficult  to  quantify.  A  satellite  retrieval 
representation  error  is  computed  when  the  resolution  of  the  retrieval  (rs)  exceeds  the 
resolution  of  the  grid  (rg)  according  to: 

er-sai  =  '  (r,s/  r g) ,  (6) 

where  er-sat  is  the  representation  error  of  the  satellite  retrievals,  Vf  is  the  gradient  of 
the  background  field  at  the  retrieval  location,  and  rs  >  rg.  Analysis  grid  resolutions 
commonly  used  in  operations  are  ~9-l8  km  (section  6),  which  results  in  an  increase 
of  the  representation  errors  of  microwave  (25  km)  and  Geostationary  Operational 
Environmental  Satellite  (GOES)  SST  (12  km),  and  Special  Sensor  Microwaveflmager 
(SSM/I)  sea  ice  (25  km)  retrievals  in  high-gradient  regions.  Other  satellite  data  with 
smaller  footprint  sizes,  such  as  altimeter  SSHA  (7  km)  and  infrared  satellite  SST  (8  km), 
are  not  affected.  Representation  error  of  temperature  and  salinity  profile  observations  is 
assumed  to  be  dependent  on  the  mesoscale  signal  and  an  uncertainty  associated  with 
internal  wave  activity.  Using  climate  variability  as  a  proxy  for  the  mesoscale  signal,  and 
the  observed  vertical  gradient  as  a  proxy  for  the  aliasing  error  associated  with  internal 
waves,  profile  temperature  (er-tmp)  and  salinity  (er.sai)  representation  errors  are  computed 
by: 

et-mp  —  ktot  -I-  Xr(d7ydz) 

*r.sal  =  KS°S  +  AS(d5/dz),  ^ 

where  aj  and  as  are  the  climate  temperature  and  salinity  standard  deviations  at  the 
observation  location  and  sampling  time,  and  dT/dz  and  dS/dz  are  the  observed  tem¬ 
perature  and  salinity  vertical  gradients.  The  constants  kj.s  and  Ay, 5  are  determined 
empirically;  they  are  currently  set  to  0.01  and  0.3,  respectively,  for  both  temperature 
and  salinity.  Since  observing  system  representation  error  is  so  poorly  known,  it  is  im¬ 
plemented  as  a  tunable  parameter  in  the  analysis.  The  ymjn  diagnostic  (described  in 
section  5)  is  used  to  determine  changes  needed  in  the  magnitude  of  the  representation 
error. 

The  exception  to  the  off-line  specification  of  the  observation-error  variances  is  that 
of  the  geopotential  observations.  Geopotential  is  computed  during  an  analysis  from 
the  temperature  and  salinity  observations,  by  integrating  the  specific-volume  anomaly 
(Fofonoff  and  Millard  i983)  from  a  level  of  no  motion  to  the  surface.  The  errors,  ea. 
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TABLE  l.  Global  operational  ocean  observation  data  sources 


Data  source 

Specifications 

Estimated  number  of 
daily  observations 

Measure  men  t-eiror 
standard  deviations 

AVHRR  GAC1  Infrared 
Satellite  SST 

8  km  day,  night,  relaxed 
day  retrievals 

NOAA-16,  17,  18 

1  200000 

variable 

AVHRR  LAC2  Infrared 
Satellite  SST 

2  km  day,  night  retrievals 
NOAA-16,  17,  18 

4000000 

variable 

GOES  Infrared  Satellite 
SST 

12  km  day,  mghL  retrievals 
GOES- 10,  12 

5000000 

variable 

Microwave  Satellite 

SST 

AMSR-E  25  km  retrievals 

5000000 

variable 

The  rmosali  nograph 

in  situ  SST,  SSS3 

1800 

1.1  degC,  0.5  PS  U 

Sea  surface  height 

Satellite  altimeters 

150000 

anomaly 

(Jason,  GFO4  and  Envlsat) 

0,03,0.08,0.08  m 

T  and  S  profiles 

300 

0.01  m 

Sea  ice  concentration 

SSM/1  25  km  retrievals 
DMSP5  FI 3,  F14.FI5 

t  200000 

5% 

Ship  SST 

Engine  room  intake 

4000 

1 .3  degC 

Hull  contact  sensor 

800 

0.6  dcgC 

Bucket  temperature 

200 

1.2  dcgC 

Buoy  SST 

Fixed 

6000 

0.05  degC 

Drifting 

32000 

0.1 2  dcgC 

CMAN6  Stations 

in  situ  SST 

120 

U  degC 

XBT 

Temperature  profiles 

100 

0,1 2  degC 

Argo  Floats 

Temperature  and 

Salinity  profiles 

200 

0.002  dcgC 

0.0 1  PSU 

CTD7  Stations 

Tempo  rain  ic  and 

50 

0.002  dcgC 

(TESACss) 

Salinity  profiles 

0.0 1  PSU 

Drifting  Buoy 

Temperature  profiles 

700 

0,12  dcgC 

Fixed  Buoy 

Temperature  and 

Salinity  profiles 

1000 

0.003  degC 

0.02  PSU 

1  Global  area  coverage, 

2  Local  area  coverage, 

*  Sea  surface  salinity. 

4  Geosat  Follow  On. 

5  Defense  Meteorological  Satellite  Program  (USA). 

6  Coastal  Marine  Automated  Network. 

1  Conductivity  Temperature  Depth, 

s  Temperature,  salinity  and  currents;  a  type  of  WMO  message  form. 
See  text  for  other  details. 


in  specific-volume  anomaly,  a  are  computed  from  the  errors,  e#  and  e$ ,  of  the  potential 
temperature  and  salinity,  S,  observations,  using  the  partial  derivatives  of  the  equation 
of  state  with  respect  to  9  and  S  according  to: 

ea  =  (da/d9)e()  +  (dafdS)es*  (8) 

The  specific-volume  anomaly  errors  are  then  integrated  through  the  water  column  from 
the  user-defined  level  of  no  motion  to  the  surface  in  the  same  way  as  the  geopotential 
itself  is  integrated.  Geopotential  errors  at  and  below  the  level  of  no  motion  are  set  to  a 
small  value. 
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4.  Ocean  observations 

The  analysis  makes  full  use  of  all  sources  of  operational  ocean  observations. 
The  ocean  observing  systems  currently  assimilated  are  listed  in  Table  I ,  along  with 
typical  global  data  counts  per  day  for  the  different  observing  system  categories. 
New  data  sources  are  continually  being  added  to  the  analysis,  such  as  surface  velocity 
observations  from  high-frequency  coastal  radar  installations.  In  addition  to  satellite 
altimeters,  SSHA  observations  computed  from  in  situ  profiles  of  temperature  and 
salinity  are  used  as  a  source  of  data  in  the  SSHA  analysis.  The  in  situ  SSHA  observations 
constrain  the  extrapolation  of  the  satellite  altimeter  sea  surface  height  (SSH)  data  by 
the  analysis  into  the  diamond  shaped  areas  that  otherwise  are  not  observed  by  nadir 
sampling  altimeters.  This  feature  is  important  when  the  satellite  altimeter  data  assimi¬ 
lated  are  from  a  short  repeat  cycle  mission  (-M0  days)  with  large  distances  (~300  km) 
between  adjacent  tracks. 

(a)  Quality  control 

All  ocean  observations  are  subject  to  data  quality -control  (QC)  procedures  prior 
to  assimilation.  The  need  for  QC  is  fundamental  to  a  data  assimilation  system. 
Accepting  erroneous  data  can  cause  an  incorrect  analysis,  while  rejecting  extreme  but 
valid  data  can  miss  important  events.  It  is  likely  that  decisions  made  at  the  QC  step  affect 
the  success  or  failure  of  the  entire  analysis/forecast  system.  A  complex  QC  process  is 
used  in  which  an  observation  is  not  rejected  as  soon  as  it  fails  an  individual  QC  test. 
Rather,  each  observation  is  subjected  to  a  series  of  tests,  with  the  final  QC  decision 
based  on  consideration  of  all  of  the  QC  test  results.  The  real-time  QC  system  described 
in  Cummings  (2006)  is  summarized  here. 

A  sequence  of  gross-error  data  checks  are  performed  first,  which  include  a  land- 
sea  boundary  test,  valid-value  range  tests,  and  a  location  (speed)  test  for  platforms  that 
report  unique  call  signs.  Next,  a  series  of  instrumentation  error  checks  are  performed 
on  expendable  bathythermographs  and  profiling  floats;  sensor  drift  in  fixed  and  drifting 
buoys  is  checked,  and  a  test  for  aerosol  contamination  is  performed  on  satellite  SST 
retrievals.  Cross  validation  checks  are  also  performed  to  ensure  the  consistency  of 
observations  within  and  between  analysis  variables.  In  the  within-variable  consistency 
check,  an  optimum  interpolation  (01)  analysis  is  performed  at  each  of  the  newly  received 
observation  locations  and  sampling  times  based  on  nearby  valid  data,  excluding  the 
datum  being  checked,  using  innovations  computed  from  climatology.  The  uncertainty 
of  the  analysed  value  is  computed  from  the  01  reduction  of  climate  error  due  to  the 
introduction  of  nearby  observations.  In  the  absence  of  any  nearby  data,  the  consistency 
check  simply  returns  climatology  and  climate  variability  as  the  analysis  estimates. 
The  cross  validation  analysed  value  and  its  uncertainty  are  used  in  the  background- 
field  check  described  below.  The  cross  validation  of  profile  observations  works  well 
in  practice  because  of  the  continuing  development  of  the  Argo  profiling  float  array 
(Argo  Science  Team  1998)  and  the  large  number  of  high  quality,  deep  profiles  that  are 
available  to  use  in  the  procedure.  Cross  validation  consistency  checks  are  also  very 
useful  in  the  QC  of  altimeter  SSHA  observations,  since  those  data  tend  to  be  spuriously 
rejected  along  sequential  segments  of  altimeter  tracks  due  to  phase  errors  in  the  model 
background  fields.  Examples  of  cross  validation  consistency  checks  between  analysis 
variables  include:  SST  and  sea  ice  concentration  to  check  for  impossible  ice  conditions; 
and  wind  speed  and  daytime  satellite  SST  retrievals  to  check  for  biases  due  to  diurnal 
warming  skin  temperature  effects.  These  procedures  produce  integer- valued  QC  flags 
of  varying  levels  of  severity,  ranging  from  information-only  (<  100),  cautionary  (>  100) 
and  fatal  (>  1000). 
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The  most  important  QC  procedures  are  the  background-field  checks,  which  include 
climatology,  and  global  and  regional  analyses  or  short-term  forecasts.  The  background 
and  background-error  fields  closest  in  time  to  the  observation  sampling  time  are  inter¬ 
polated  to  the  observation  location.  The  probability  of  an  erroneous  value  is  calculated 
from  the  difference  between  observed  and  background  values,  assuming  an  unbiased 
normal  distribution  with  appropriate  climate-,  analysis-  or  prediction-error  standard 
deviations.  Histograms  and  formal  statistical  tests  show  that  the  innovations  are  nor¬ 
mally  distributed,  although  climate  innovations  tend  to  have  very  long  tails.  In  addition 
to  the  level-by-level  field  checks  described  above,  observed  profiles  are  compared  to 
profiles  extracted  from  the  various  background  fields  and  cross  validation  profiles  using 
a  profile-shape  QC  procedure,  This  procedure  computes  an  integrated  probability  of 
random  error  that  takes  into  account  level-thicknesses  when  comparing  observed  and 
predicted  values,  and  has  the  advantage  of  taking  an  overview  of  the  entire  profile. 
The  background-field  and  shape-error  probabilities  are  used  in  combination  with  the 
QC  flags  in  a  decision-making  algorithm  when  selecting  observations  for  the  assimila¬ 
tion.  (Quality  controlled  ocean  observations  used  in  the  FNMOC  MVOI  analyses  are 
available  on  the  US  GODAE  data  server  http://www.usgodae.org/  in  near  real-time.) 

Within  the  analysis  itself  a  final  QC  procedure  is  performed  to  remove  observations 
that  have  passed  the  error  checks  described  above  but  that  may  still  be  unacceptable 
to  the  analysis  background.  This  procedure  is  referred  to  as  the  innovation-error  check 
and  is  done  in  the  following  way.  Given  the  diagonal  of  HPt,HT  +  R  and  they  -  H(Xh) 
innovation  vector,  compute  the  normalized  innovations  as  d  =  diag(HPt,H’r  +  R)-1/2 
{y  —  H(xh)}.  The  elements  of  the  normalized  innovation  vector  over  many  realiza¬ 
tions  should  be  a  normal  distribution  with  a  standard  deviation  equal  to  one  if  the 
background-  and  the  observation-error  covariances  Ph  and  R  have  been  properly 
specified.  Assuming  this  to  be  the  case,  tolerance  limits  (7jj  are  defined  to  remove  un¬ 
acceptable  observations.  The  tolerance  limits  are  equivalent  to  the  number  of  acceptable 
standard  deviations  of  the  innovation  from  the  analysis  background.  Since  Ph  and  R  are 
never  perfectly  known,  it  is  best  to  use  a  higher  tolerance  value  rather  than  a  lower  one 
in  this  procedure.  The  test  statistic  is  ‘reject  the  observation  if  d  >  7l\  where  TL  =  4  is 
specified. 


( b )  Pre-processing  of  ocean  observations 
Several  pre-processing  functions  are  performed  on  the  quality-controlled  ocean  ob¬ 
servation  datasets  prior  to  assimilation.  All  surface  data  types  are  reduced  in  number  by 
the  formation  of  ‘super-observations’.  Super-observations  are  innovations  averaged  into 
bins  that  are  dependent  on  the  grid  resolution  and  the  observation  data  type.  Thinning  of 
observations  is  a  necessary  step  in  the  analysis  in  order  to  remove  redundancies  in  the 
data  and  minimize  horizontal  correlations  among  observations.  The  super-observation 
algorithm  used  to  thin  data  in  the  analysis  is  adaptive.  As  the  model  grid  resolution 
increases,  the  actual  number  of  innovations  averaged  into  a  super-observation  decreases 
until,  eventually,  the  original  data  are  directly  assimilated.  This  feature  is  very  useful 
in  a  nested  analysis  run,  where  the  nested  grids  telescope  down  to  the  desired  forecast 
model  grid  resolution. 

SST  observations  are  stratified  further  by  water  mass  in  western  boundary  current 
regions.  The  water  mass  classification  is  based  on  Bayes  rule,  given  here  in  density  form: 


p(i  |  *)  =  p{ x  |  i)p(i)/p(x) 
p(x  |  i)  =  N(x  |  to,  (ij ), 


(9) 
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where  p(i  |  Jt)  is  the  a  posteriori  probability  of  water  mass  i,  given  temperature  x , 
p(i)  is  the  a  priori  probability  of  occurrence  of  the  ith  water  mass,  and  p(x  |  r)  is 
the  conditional  probability  of  temperature  x  given  water  mass  i ,  which  is  a  normal 
probability  distribution  function  with  parameters  (/a,-,  07).  The  Bayesian  water  mass 
classification  statistics  are  computed  off-line  from  analyses  of  temperature  frequency 
distributions  using  monthly  datasets  of  Advanced  Very  High  Resolution  Radiometer 
(AVHRR)  SST  retrievals.  The  frequency  distribution  analysis  computes  the  best  fit 
of  a  mixture  of  normal  probability  distribution  functions  that  describe  the  observed, 
polymodal  frequency  distribution.  The  number  of  modes  (water  masses),  and  the 
maximum  likelihood  estimates  of  the  means  (/*,■)  and  standard  deviations  (oy)  of  the 
composite  normal  distributions  in  the  mixture,  provide  the  necessary  classification 
statistics.  Figure  4  shows  the  SST  frequency  distribution  for  the  month  of  July  in  the 
Brazil-Malvinas  confluence  area  with  an  overlay  of  the  best-fit  mixture  of  normal  prob¬ 
ability  distributions,  The  water  mass  classification  decision  boundaries  are  the  local 
minimums  of  the  overlapping  normal  probability  distributions.  Figure  4  also  shows 
the  geographic  distribution  of  water  mass  classified  Advanced  Microwave  Scanning 
Radiometer  (AMSR)-E  microwave  SST  observations  for  16-17  July  2005.  This  inde¬ 
pendent,  synoptic  SST  dataset  clearly  shows  the  boundaries  of  the  narrow  ocean  fronts 
and  associated  mesoscale  eddies.  The  water  mass  classification  of  SST  observations 
prevents  the  averaging  of  dissimilar  observations  across  ocean  fronts  and  eddies  during 
the  formation  of  the  super-observations.  This  process  helps  to  maintain  horizontal  SST 
gradients  in  the  analysis.  In  addition  to  the  Brazil-Malivinas  confluence,  water  mass 
classification  statistics  have  also  been  computed  for  the  Agulhas  Current,  Gulf  Stream, 
Kursohio  Extension,  Sea  of  Japan,  and  East  Australian  Current  regions. 

Pre-processing  options  on  the  profile  data  include:  (i)  specification  of  selection 
criteria  to  ensure  adequate  sampling  in  the  vertical,  (ii)  vertical  extension  of  the  profiles 
from  the  last  observed  depth  to  the  bottom  using  the  first-guess  background  field,  and 
(iii)  assimilation  of  profile  observations  at  observed  levels  or  after  vertical  interpolation 
to  model  levels. 


(c)  Altimeter  sea  surface  height  assimilation 

From  Table  1 ,  it  is  apparent  that  most  ocean  observations  are  remotely  sensed 
and  measure  ocean  variables  only  at  the  surface.  The  lack  of  synoptic  real-time  data 
at  depth,  places  severe  limitations  on  the  ability  of  the  data  assimilation  system  to 
resolve  and  maintain  an  adequate  representation  of  the  ocean  mesoscale.  Sub-surface 
properties  in  the  ocean,  therefore,  must  be  inferred  from  surface-only  observations.  The 
most  important  observing  system  for  this  purpose  is  satellite  altimetry,  which  provides 
measures  of  the  time  varying  change  in  SSH  in  the  form  of  anomalies  from  an  8-year 
repeat  track  mean.  Changes  in  SSH  are  strongly  correlated  with  changes  in  the  depth  of 
the  thermocline  in  the  ocean,  and  the  ocean  dynamics  generating  SSH  changes  are  the 
mesoscale  eddies  and  meandering  ocean  fronts.  Two  alternative  methods  can  be  used 
in  the  analysis  to  assimilate  satellite  altimeter  observations  of  SSHA.  In  neither  method 
is  the  altimeter  data  directly  used  to  change  the  background  fields.  Rather,  altimeter- 
derived  synthetic  profile  observations  are  computed,  and  combined  in  the  analysis  with 
in  situ  observations  and  the  model  background,  which  requires  specifying  observation 
errors  of  the  synthetic  profiles. 

One  approach  is  the  assimilation  of  synthetic  temperature  profiles  computed  using 
the  Modular  Ocean  Data  Assimilation  System  (MODAS)  database,  MODAS  models  the 
time-averaged  co-variability  of  SSH  and  temperature  at  depth  at  a  fixed  location  (Fox 
et al.  2002),  Regression  coefficients  derived  from  the  historical  profile  archive  have  been 
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Figure  4.  (a)  Observed  (histogram)  and  fitted  (solid  curve)  frequency  distribution  of  AVHRR  global  area 

coverage  satellite  sea  surface  temperature  (SST)  retrievals  in  the  Brazil-Mat  vinas  confluence  region  for  the  month 
of  July,  Frequency  distribution  binning  interval  is  0,25  degC.  (b)  Geographic  distribution  of  water  mass  classified 
AMSR-E  microwave  satellite  SST  retrievals  for  16-17  July  2005-  Water  mass  classification  numbers  are  defined 
from  cold  to  warm  temperature  modes  in  the  filled  frequency  distribution.  The  binning  interval  is  0.25  degC  and 
the  frequencies  plotted  have  been  normalized  to  range  from  0  to  1  for  display  purposes. 
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computed  that  relate  steric  height  anomalies  to  climate  temperature  anomalies  at  depth. 
The  error  variances  of  the  MODAS  synthetic  profiles  in  the  analysis  depend  upon  the 
accuracy  of  the  SSH  predictor  field  and  upon  the  magnitude  of  the  residual  errors  of 
the  regressions  relating  steric  height  and  temperature  at  depth.  MODAS  residual  errors 
vary  with  location,  depth  and  time  of  year,  and  are  generally  small  in  western  boundary 
current  regimes  where  steric  height  anomalies  are  highly  correlated  with  changes  in  the 
depth  of  the  thermocline.  MODAS  has  marginal  skill  in  other  areas  of  the  world’s  oceans 
due  to:  (i)  sampling  limitations  of  the  historical  profile  data,  (ii)  non-steric  signals  in  the 
altimeter  data,  and  (iii)  weak  correlations  between  steric  height  and  temperature  at  depth 
due  to  other  factors,  such  as  the  influence  of  salinity  on  steric  height  at  high  latitudes  and 
in  eastern  boundary  regions.  Salinity  profiles  are  computed  from  the  synthetic  MODAS 
temperature  profiles  in  a  subsequent,  separate  step  as  described  in  subsection  4(d).  The 
MODAS  synthetic  temperature  and  derived  salinity  profile  method  is  primarily  used 
when  the  analysis  system  is  cycling  on  itself. 

A  second  and  alternative  approach  is  the  direct  assimilation  of  observed  SSH 
changes  using  a  modified  form  of  the  method  developed  by  Cooper  and  Haines  (1996). 
In  this  approach,  the  model  forecast  density  field  is  adjusted  to  correct  errors  between 
the  model  height  field  and  the  height  field  measured  by  the  altimeter.  The  adjustments 
are  computed  by: 


(10) 


where  Aps  is  the  surface  pressure  change  measured  by  the  difference  between  the 
model  background  and  the  satellite  altimeter  observation,  A pz  is  the  change  in  density 
at  level  z,  g  is  the  gravitational  constant,  and  A/jh  is  the  bottom  pressure  change. 
The  depth  range  of  the  density  corrections  is  constrained  to  be  between  the  mixed  layer 
(z,)  and  a  level  of  no  motion  (zh)  where  Apt,  is  assumed  to  be  zero.  Output  of  the 
integration  is  in  the  form  of  innovations  of  temperature  and  salinity  from  the  background 
field.  The  observation -error  variances  of  the  temperature  and  salinity  innovations  are 
computed  as  the  sum  of  the  respective  background-error  variance  plus  the  residual 
error  from  the  iterative  fit  of  the  density  adjustments  to  the  observed  change  in  SSH. 
An  advantage  of  this  direct  method  over  the  MODAS  method  is  that  it  relies  on 
model  dynamics  for  a  priori  information  rather  than  statistics  fixed  at  the  start  of  the 
assimilation.  Furthermore,  the  method  computes  adjustments  to  the  model  temperature 
and  salinity  profiles  simultaneously.  As  a  result,  it  does  not  introduce  spurious  water 
masses  into  the  model.  A  disadvantage  is  that  the  direct  method  cannot  explicitly  correct 
for  errors  in  the  stratification  or  long-term  drift  of  the  water  mass  characteristics  in 
the  model.  Also  there  are  difficulties  applying  the  direct  method  in  weakly  stratified 
conditions  (Fox  et  al.  2000)  which  occur  primarily  at  high  latitudes.  Accordingly, 
altimeter  SSHAs  are  scaled  to  zero  at  latitudes  greater  than  65°.  The  direct  method 
is  only  used  when  the  analysis  is  cycled  with  an  ocean  forecast  model. 

While  having  the  potential  of  adding  important  information  in  data-sparse  areas, 
the  number  of  altimeter-derived  synthetic  observations  computed  can  greatly  exceed 
and  overwhelm  the  in  situ  observations  in  the  analysis.  Accordingly,  the  synthetic 
observations  are  thinned  prior  to  the  analysis  in  three  steps.  In  the  first  step,  it  is 
assumed  that  directly  observed  temperature  and  salinity  profiles  are  a  more  reliable 
source  of  subsurface  information  wherever  such  observations  exist.  Altimeter-derived 
subsurface  profiles,  therefore,  are  not  generated  in  the  surrounding  area  of  an  in  situ 
profile  observation  defined  by  the  local  correlation  length-scale.  In  the  second  step, 
the  analysed  incremental  change  in  SSH  measured  by  the  altimeter  must  exceed  a 
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Figure  5.  Synthetic  profile  sampling  pattern  for  a  global  three-dimensional  multivariate  optimum  interpolation 
analysis  valid  21  October  2005:  (a)  altimeter  sea  surface  height  anomaly  (SSHA)  analysts  increments  (colour 
filled  contours)  with  synthetic  profile  locations  (dots)  in  the  Agulhas  current  subdomain  of  the  global  grid  where 
the  absolute  value  of  the  SSHA  increment  exceeds  4  cm;  (b)  global  locations  of  synthetic  profiles  generated 
from  SSHA  increments.  A  total  of  3862  synthetic  profile  locations  were  sampled*  which  created  99210  synthetic 
temperature-salinity  depth  observations  in  the  analysis. 


threshold  value,  defined  as  the  noise  level  of  the  satellite  altimeters,  to  trigger  the 
generation  of  a  synthetic  observation.  Finally,  in  the  third  step,  local  correlation  length- 
scales  are  used  to  control  the  density  of  the  synthetic  profiles  within  the  contours  of  SSH 
change  that  exceed  the  prescribed  noise-level  threshold.  An  example  of  the  synthetic 
profile  sampling  pattern  generated  in  a  global  MVOI  analysis  is  shown  in  Fig.  5. 
Figure  5(a)  shows  the  synthetic  profile  locations  in  relation  to  the  analysed  SSHA 
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increments  in  the  vicinity  of  the  Agulhas  current,  using  a  noise-level  threshold  value 
of  4  cm;  Fig.  5(b)  shows  the  global  distribution  of  altimeter-derived  synthetic  profiles 
generated  primarily  in  western  boundary  current  regions  and  the  Antarctic  circumpolar 
current,  where  the  daily  change  in  SSHA  is  large  due  to  mesoscale  variability. 

(d)  Salinity  observations 

In  the  multivariate  analysis,  observed  temperatures  must  have  a  companion  salinity 
observation  in  the  computation  of  the  geopotential  observations.  Salinity  is  routinely 
measured  by  some  observing  systems,  such  as  Argo  floats  and  shipboard  conductivity- 
temperature-depth  (CTD)  profiles,  but  other  observing  systems  measure  only  tempera¬ 
ture.  For  these  observing  systems,  salinity  is  estimated  from  the  observed  temperature 
using  temperature-salinity  regression  relationships  derived  from  the  historical  profile 
archive  and  stored  in  the  MODAS  database.  The  observation -error  variance  of  the 
derived  salinity  values  is  estimated  from  the  residuals  of  the  MODAS  regressions,  which 
vary  in  both  space  and  time.  MODAS-derived  salinities  have  good  skill  at  depths  where 
the  temperature-salinity  relationship  is  well  defined  and  the  historical  profile  archive 
is  adequate,  as  shown  by  independent  verification  against  Argo  and  shipboard  CTD 
profiles.  Near  the  surface  this  relationship  breaks  down,  and  the  derived  salinities  tend 
to  reflect  the  climate  mean  that  is  used  as  the  basis  function  in  the  regressions.  However, 
the  near-surface  regression  residuals,  and  thus  the  observation  errors,  are  also  large,  with 
the  result  that  MODAS-derived  salinity  observations  near  the  surface  carry  little  weight 
in  the  analysis. 


5.  Validation  and  verification 

Residual  vectors  {y  -  H(xa)l  are  routinely  computed  for  each  analysis  variable. 
The  residuals  and  the  innovations  for  all  observations  assimilated  are  saved  at  the  end 
of  each  update  cycle  in  an  innovation-vector  file.  When  cycling  with  an  ocean  forecast 
model  the  analysis  system  also  saves  forecast  field  values  at  the  observation  locations 
for  all  forecast  periods  that  are  multiples  of  the  analysis  update  cycle  time  interval. 
These  observation  innovation-  and  forecast-vector  files  allow  rapid  assessment  of  the 
impact  of  the  assimilation  on  the  skill  of  the  forecast,  which  is  a  useful  diagnostic  tool 
for  determining  the  performance  of  the  analysis.  A  time  history  of  the  innovations  and 
the  residuals  is  the  basic  information  needed  to  compute  a  posteriori  refinements  to  the 
statistical  parameters  required  in  the  MVOI.  Statistical  analysis  of  the  innovations  is 
the  most  common,  and  the  most  accurate,  technique  for  estimating  observation-  and 
forecast-error  covariances,  and  the  method  has  been  successfully  applied  in  practice 
(e.g.  Hollingsworth  and  Lonnberg  1986).  Examination  of  the  residual  vectors  is  also 
very  useful  in  assessing  the  fit  of  the  analysis  to  specific  observations  or  observing 
systems.  A  spatial  autocorrelation  analysis  of  the  residuals  is  used  to  determine  if 
the  analysis  has  extracted  all  of  the  information  in  the  observing  system.  Any  spatial 
correlation  remaining  at  spatial  lags  greater  than  zero  represents  information  that  has 
not  been  extracted  by  the  analysis  and  indicates  an  inefficient  analysis.  Figure  6  shows 
the  results  of  a  binned  average,  spatial  autocorrelation  analysis  of  the  innovations  and 
residuals  of  drifting  buoy  SST  observations  for  April  2005  in  the  global  3D  MVOI 
analysis.  The  spatial  averaging  bins  are  determined  by  the  resolution  of  the  analysis  grid. 
A  SOAR  model  is  a  good  fit  to  the  innovation  autocorrelations,  and  the  analysis  residuals 
are  uncorrelated  at  all  spatial  lags  greater  than  one.  However,  the  positive  correlation 
in  the  first  spatial  bin  of  (he  residual  autocorrelation  indicates  that  the  analysis  does 
not  fit  the  data  to  within  the  specified  observational  error  limits  and  is  suboplima! 
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Figure  6.  (a)  Spatial  autocorrelation  analysis  of  drifting- buoy  sea  surface  temperature  (SST)  innovations  and 

residuals  from  global  three-dimensional  multivariate  optimum  interpolation  analyses  for  April  2005.  Innovation 
autocorrelations  are  marked  X  and  residual  autocorrelations  are  marked  O.  The  solid  line  is  the  I  cast -squares  fit 
of  a  second  order  auto- regressive  model  to  the  innovation  autocorrelation  function,  (b)  Locations  of  the  drifting 
buoy  SST  observations  used  in  the  autocorrelation  analysis. 


(Hollingsworth  and  Lonnberg  1989).  The  source  of  this  discrepancy  in  the  background- 
error  covariance  modelling  is  under  active  investigation. 

A  menu-driven  diagnostic  package  has  been  developed  to  allow  end  users  to  moni¬ 
tor  both  the  quality  and  the  performance  aspects  of  the  analysis  system.  Example  output 
from  the  diagnostic  package  is  shown  in  Fig.  7,  which  gives  a  time  series  of  the  daily 
temperature  climate,  innovation,  and  residual  RMS  and  mean  bias  errors  for  the  global 
MVOI  analysis.  The  analysis  was  started  from  climatology  on  20  June  2005  and  cycled 
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Figure  7.  Daily  residual,  innovation  and  climate  temperature  RMS  error  and  mean  bias  error  statistics  for  global 
three-dimensional  multi  variate  optimum  interpolation  analyses  from  20  June  to  1  November  2005 ,  All  sources  of 
temperature  observations  arc  included  in  the  error  calculations.  Statistics  are  shown  for  the  upper  800  m  of  the 
water  column.  The  contour  interval  is  0.1  dcgC.  Each  tick  mark  along  the  horizontal  axes  of  the  bottom  panels 

represents  a  daily  analysis  update  cycle. 


for  135  days  using  a  24-hour  update  cycle.  On  average,  —  1 800  in  situ  profiles  are 
assimilated  each  day,  along  with  -“3500  MODAS  synthetic  profiles  generated  to  capture 
daily  SSHA  changes  measured  by  the  altimeters  using  the  sampling  scheme  described 
in  subsection  4(c).  The  verification  time  series  shows  a  consistent  pattern  of  reduction  in 
RMS  error  from  climatology  to  the  previous  analysis  (innovations).  A  further  reduction 
in  RMS  error  is  also  seen  from  the  innovations  to  the  analysis  residuals.  Residual  and 
innovation  mean  biases  are  close  to  zero,  other  than  during  a  period  in  early  Septem¬ 
ber  when  the  transmission  of  satellite  SST  and  altimeter  data  from  NAVOCEANC)  was 
interrupted  due  to  Hurricane  Katrina.  The  innovation-error  plots  show  that  the  analysis 
quickly  recovered  once  the  data  transfers  resumed.  Residual  RMS  errors  are  a  maxi¬ 
mum  at  depths  corresponding  to  high  vertical  gradients  in  the  global  ocean,  which  is 
consistent  with  the  specification  of  the  profile  representation  errors  described  previ¬ 
ously.  These  results  demonstrate  a  robust  analysis,  and  effective  use  of  the  operational 
observing  systems  in  the  analysis. 
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Figure  8,  Time  series  of  the  Jmm  statistic  (see  section  5)  for  temperature  profiles  assimilated  in  global  Lhree- 
dimensional  multivariate  optimum  interpolation  analyses  from  12  March  to  19  May  2005.  The  temperature 
profiles  assrmi latcd  include  all  in  situ  observations,  plus  synthetic  temperature  profiles  computed  from  altimeter 
SSHA  analyses  using  the  MQDAS  method  (see  subsection  4(c)).  The  temperature  profile  representation  error  was 

adjusted  on  May  L 


The  consistency  of  the  specified  error  variances  with  the  innovation  vector  is 
estimated  by  the  a  posteriori  ymjn  diagnostic  (Daley  and  Barker  2001).  When  normal¬ 
ized  by  the  number  of  observations,  the  expected  value  of  the  /mill  diagnostic  is  one. 
imin  values  not  equal  to  one  indicate  that  either  the  background  and/or  observation 
covariances  are  specified  incorrectly,  or  that  erroneous  observations  are  being  assimi¬ 
lated.  The  ymin  diagnostic  is  routinely  computed  for  all  observing  systems  and  analysis 
variables  at  each  update  cycle.  While  all  aspects  of  the  analysis  system  affect  /min. 
it  is  assumed  in  the  adaptive  scheme  implemented  here  that  any  discrepancy  of  ymjn 
with  respect  to  its  expected  value  is  due  to  incorrect  specification  of  the  error  variances. 
Accordingly,  a  simple  scalar  is  used  in  the  analysis  to  increase  or  decrease  the  observa¬ 
tion  representation  and/or  background-error  variances.  By  monitoring  the  ymin  diagnos¬ 
tic  in  subsequent  executions  of  the  analysis,  it  can  be  determined  if  the  error  variance 
scaling  produces  an  appropriate  response.  For  example.  Fig.  8  shows  a  2-month  time 
series  of  daily  ymjn  diagnostics  computed  for  temperature  profile  observations  from  the 
global  3D  MVOI  analysis.  Assuming  that  ymjn  values  less  than  one  are  due  to  an  incor¬ 
rect  specification  of  the  temperature  observation  error  variance,  the  temperature  profile 
representation  error  was  decreased  on  I  May.  The  ymjn  diagnostic  exhibited  an  immedi¬ 
ate  overshoot,  but  subsequent  update  cycles  show  the  statistic  approaching  the  expected 
value  of  one.  It  should  be  noted,  however,  that  inferences  made  from  the  yn,jn  diagnostic 
require  very  large  sample  sizes  from  many  time  integrations  of  the  assimilation  system. 


6.  Operations  and  future  applications 

As  mentioned  in  the  introduction,  the  NCODA  multivariate  analysis  is  running 
in  an  analysis-only,  real-time  operational  mode  at  the  US  Navy  oceanographic  pro¬ 
duction  centres.  At  FNMOC,  global  9-km  resolution  SST  and  sea  ice  analyses  are 
being  produced  as  a  contribution  to  the  GODAE  High  Resolution  SST  pilot  project 
(GHRSST-PP).  The  GHRSST-PP  analyses  are  executed  using  a  6-hour  update  cycle  and 
are  available  within  6  hours  of  real-time.  In  addition,  a  global  3D  MVOI  analysis  exe¬ 
cuted  at  18-km  resolution  is  being  produced  by  FNMOC.  The  3D  analysis  assimilates 
the  same  suite  of  observations  as  the  GHRSST-PP  product,  plus  in  situ  temperature  and 
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salinity  profiles  from  Argo  floats,  shipboard  CTD  and  expendable  bathythermograph 
(XBT)  casts,  thermistor  chain  fixed  and  drifting  buoys,  and  satellite  altimeter  SSHA 
observations  assimilated  via  the  MODAS  synthetic  profile  approach.  The  global  3D 
MVOI  analysis  is  executed  using  a  24-hour  update  cycle,  and  the  analysis  products  are 
available  within  9  hours  of  real-time.  The  global  analysis  is  being  used  to  evaluate  the 
impact  of  the  assimilation  of  Argo  profiles  and  satellite  altimeter  data  as  a  contribution 
to  GODAE.  The  GHRSST-PP  and  the  global  3D  MVOI  analyses  are  available  on  the 
US  GODAE  server  at  http://www.usgodae.org. 

Two  new  applications  of  NCODA  are  under  development.  First,  NCODA  is  being 
implemented  as  the  data  assimilation  component  of  the  HYCOM  ocean  forecast  model 
as  part  of  a  US  contribution  to  GODAE.  This  system  will  initially  be  implemented 
in  basin-scale  analysis/forecast  modelling  systems  in  the  North  Atlantic  and  North 
Pacific,  with  the  ultimate  aim  of  a  fully  global  implementation  soon  thereafter.  The  layer 
structure  of  HYCOM  presents  new  challenges  to  the  assimilation  system.  A  new 
analysis  variable  has  been  added  that  computes  the  correction  of  the  model  forecast 
isopycnal  layer  pressures;  these  are  needed  to  match  the  density  profile  computed  from 
the  observed  temperature  and  salinity.  The  correction  moves  model  layers  at  their  target 
density  to  the  new  pressure  levels  subject  to  a  series  of  constraints  that  are  applied 
after  the  analysis  in  a  model  initialization  step  (e.g.  hydrostatic  stability,  minimum  layer 
thickness).  Evaluation  of  the  skill  of  the  HYCOM  forecasts  issued  from  the  analyses 
is  ongoing.  The  second  application  of  NCODA  is  the  development  of  an  ensemble 
forecasting  capability  for  application  in  limited-area  modelling.  The  forecast  ensembles 
are  generated  by  a  space-time  deformation  of  the  atmospheric  forcing  fields  as  well 
as  a  perturbation  of  the  ocean  model  initial  conditions  using  the  ensemble  transform 
technique  (Bishop  and  Toth  1999).  The  ensemble  of  forecast  perturbations  is  then 
transformed  to  an  ensemble  of  analysis  perturbations  using  the  ensemble  transform 
Kalman  Filter  (ETKF;  Bishop  et  at.  2001).  Flow-dependent  covariances  are  derived 
from  the  ETKF  and  directly  input  into  the  multivariate  analysis  for  the  next  control  run 
using  a  hybrid  error-covariance  formulation  (Etherton  and  Bishop  2004).  The  ensemble 
system  will  be  used  in  an  adaptive  sampling,  targeted  observation  application  with  ocean 
gliders,  Ocean  gliders  provide  up  and  down  profiles  of  temperature  and  salinity  and 
other  variables,  and  the  path  of  a  glider  through  the  water  column  can  be  controlled 
from  the  surface.  The  ETKF  determines  the  way-points  for  the  next  dive  of  the  glider 
and  the  set  of  glider  profile  observations  that  minimize  the  ocean  model  forecast  error 
in  a  predefined  verification  area. 
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